[Bugfix][V1] Fix allowed_token_ids for v1 Sampler #14169
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: Lu Fang <[email protected]>
Force-pushed from 2439e4a to a5457a6.
LGTM! Thanks for fixing this; I left two minor comments.
vllm/v1/engine/processor.py
Outdated
if params.allowed_token_ids is not None and len(
        params.allowed_token_ids) == 0:
This can be just `if not params.allowed_token_ids`, since if `params.allowed_token_ids` is None it would have already returned above.
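A quick illustration of the Python truthiness this relies on (standalone snippet, not the processor code):

```python
assert bool([]) is False    # an empty list is falsy, so `not` catches it
assert bool([7]) is True    # a non-empty list passes the check
assert bool(None) is False  # None is falsy too, but that case returned earlier
```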
vllm/v1/worker/gpu_input_batch.py
Outdated
- self.allowed_token_ids_mask = torch.zeros(self.max_num_reqs,
-                                            self.vocab_size,
-                                            dtype=torch.bool,
-                                            device=self.device)
- self.allowed_token_ids_mask_cpu_tensor = torch.zeros(
+ self.allowed_token_ids_mask = torch.ones(self.max_num_reqs,
+                                           self.vocab_size,
+                                           dtype=torch.bool,
+                                           device=self.device)
+ self.allowed_token_ids_mask_cpu_tensor = torch.ones(
      self.max_num_reqs,
      self.vocab_size,
      dtype=torch.bool,
      device="cpu")

  self.allowed_token_ids_mask_cpu_tensor[req_index][
-     sampling_params.allowed_token_ids] = True
+     sampling_params.allowed_token_ids] = False
I think the updated logic here is a bit counter-intuitive to readers at a glance - can we add a NOTE here that all token ids with a `True` mask will be filled with `float("-inf")` during sampling?
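For readers skimming the thread, a small standalone sketch of the convention under discussion, using `masked_fill_` as the PR description mentions: positions whose mask value is True get filled with `float("-inf")` before sampling, so only the allowed token ids remain reachable.

```python
import torch

vocab_size = 8
logits = torch.randn(vocab_size)

# True means "disallowed": these positions get filled with -inf at sampling time.
mask = torch.ones(vocab_size, dtype=torch.bool)
allowed_token_ids = [1, 3, 5]
mask[allowed_token_ids] = False  # allowed positions keep their original logits

logits.masked_fill_(mask, float("-inf"))
print(logits)  # only indices 1, 3 and 5 remain finite
```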
yeah, added some comments
Thanks @houseroad, looks good.
vllm/v1/engine/processor.py
Outdated
if params.allowed_token_ids is not None and len(
        params.allowed_token_ids) == 0:
    raise ValueError("allowed_token_ids is not None and empty!")
if not all(0 <= tid < self.model_config.get_vocab_size()
Suggested change:
- if params.allowed_token_ids is not None and len(
-         params.allowed_token_ids) == 0:
-     raise ValueError("allowed_token_ids is not None and empty!")
- if not all(0 <= tid < self.model_config.get_vocab_size()
+ if not params.allowed_token_ids:
+     raise ValueError("allowed_token_ids cannot be empty")
+ vocab_size = self.model_config.get_vocab_size()
+ if not all(0 <= tid < vocab_size
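As a standalone sketch of how the suggested checks behave (hypothetical helper; `vocab_size` stands in for `self.model_config.get_vocab_size()`):

```python
def validate_allowed_token_ids(allowed_token_ids, vocab_size: int) -> None:
    # Hypothetical free-function version of the suggested validation above.
    if allowed_token_ids is None:
        return
    if not allowed_token_ids:
        raise ValueError("allowed_token_ids cannot be empty")
    if not all(0 <= tid < vocab_size for tid in allowed_token_ids):
        raise ValueError("allowed_token_ids must be within the vocabulary")

validate_allowed_token_ids([1, 2, 3], vocab_size=32000)  # passes
# validate_allowed_token_ids([], 32000)    -> ValueError: cannot be empty
# validate_allowed_token_ids([-1], 32000)  -> ValueError: out of vocabulary range
```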
Makes sense, addressed the comments.
Signed-off-by: Lu Fang <[email protected]>
Please check the discussion on Slack regarding `self.allowed_token_ids_mask` - we need to take into account the situation where requests have different sampling params, which is fairly common in the online setting.
@liangfu yes, thanks to @robertgshaw2-redhat for pointing out that we only want to invert the mask for the applicable rows. So the tensors should still be initialized to zeros / reset to zeros when a request is removed, and when adding a request, just fill its row with ones before setting the allowed tokens to zero.
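A minimal standalone sketch of that per-row handling (plain tensors and free functions; the real logic lives in `gpu_input_batch.py`, so names here are illustrative): the full mask stays all-zeros, a row is flipped to ones only when its request actually carries `allowed_token_ids`, and the allowed positions in that row are then cleared.

```python
import torch

max_num_reqs, vocab_size = 4, 8

# Initialized to zeros: a row that never gets allowed_token_ids disallows nothing.
allowed_token_ids_mask_cpu = torch.zeros(max_num_reqs, vocab_size, dtype=torch.bool)

def add_request(req_index: int, allowed_token_ids) -> None:
    if allowed_token_ids:
        # Invert only this row: start with everything disallowed (True),
        # then clear the positions that are actually allowed.
        allowed_token_ids_mask_cpu[req_index] = True
        allowed_token_ids_mask_cpu[req_index, allowed_token_ids] = False

def remove_request(req_index: int) -> None:
    # Reset the row to zeros so a reused slot is unrestricted by default.
    allowed_token_ids_mask_cpu[req_index].fill_(False)

add_request(0, [1, 3])  # row 0: True everywhere except columns 1 and 3
add_request(1, None)    # row 1: stays all False (no restriction)
remove_request(0)       # row 0: back to all False
```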
…ctly Signed-off-by: Lu Fang <[email protected]>
vllm/v1/worker/gpu_input_batch.py
Outdated
@@ -359,7 +362,7 @@ def remove_request(self, req_id: str) -> Optional[int]:
      self.logit_bias[req_index] = None
      self.has_allowed_token_ids.discard(req_id)
      if self.allowed_token_ids_mask_cpu_tensor is not None:
-         self.allowed_token_ids_mask_cpu_tensor[req_index].fill_(False)
+         self.allowed_token_ids_mask_cpu_tensor[req_index].fill_(True)
@houseroad this should also be reverted, right?
True, added more comments to help with understanding.
…ctly Signed-off-by: Lu Fang <[email protected]>
Thanks @houseroad, LGTM now
Thanks for the fix!
This was missed when merging vllm-project#14169 and vllm-project#14159 Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Johnny <[email protected]>
Revert the ones and zeros so that `masked_fill_` is applied appropriately.
Tested with 'test_allowed_token_ids' in #14159
Additional test:
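The additional test itself is not included above. Purely as an illustration of the behaviour being verified (a hypothetical check, not the author's actual test), a self-contained version of the mask convention could look like:

```python
import torch

def test_allowed_token_ids_mask_sketch():
    # Hypothetical illustration of the mask convention, not the PR's test.
    vocab_size = 16
    allowed = [2, 5, 7]
    logits = torch.randn(vocab_size)

    mask = torch.ones(vocab_size, dtype=torch.bool)
    mask[allowed] = False
    logits.masked_fill_(mask, float("-inf"))

    # Greedy sampling can now only ever pick an allowed token id.
    assert int(torch.argmax(logits)) in allowed

test_allowed_token_ids_mask_sketch()
```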